
    Algorithm and Hardware Design of Discrete-Time Spiking Neural Networks Based on Back Propagation with Binary Activations

    We present a new back-propagation-based training algorithm for discrete-time spiking neural networks (SNNs). Inspired by recent deep learning algorithms for binarized neural networks, a binary activation with a straight-through gradient estimator is used to model the leaky integrate-and-fire spiking neuron, overcoming the difficulty of training SNNs with back propagation. Two SNN training algorithms are proposed: (1) SNN with discontinuous integration, which is suitable for rate-coded input spikes, and (2) SNN with continuous integration, which is more general and can handle input spikes carrying temporal information. Neuromorphic hardware designed in 40nm CMOS exploits spike sparsity and demonstrates high classification accuracy (>98% on MNIST) at low energy (48.4-773 nJ/image). Comment: 2017 IEEE Biomedical Circuits and Systems Conference (BioCAS).
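    The straight-through gradient estimator mentioned in the abstract can be sketched in a few lines: the forward pass thresholds the membrane potential into a binary spike, and the backward pass passes the gradient through a clipping window instead of the true (zero-almost-everywhere) derivative. The threshold value, window width, and function names below are illustrative assumptions, not the paper's exact formulation:

    ```python
    import numpy as np

    def binary_activation_forward(v, threshold=1.0):
        # Binary spike: fire (1) when the membrane potential crosses
        # the threshold, stay silent (0) otherwise.
        return (v >= threshold).astype(np.float64)

    def binary_activation_backward(v, grad_out, threshold=1.0, window=1.0):
        # Straight-through estimator: the true derivative of the step
        # function is zero almost everywhere, so pass the upstream
        # gradient through unchanged wherever the potential lies within
        # a window around the threshold, and zero it elsewhere.
        pass_through = (np.abs(v - threshold) <= window).astype(np.float64)
        return grad_out * pass_through
    ```

    This clipped pass-through is what lets standard back propagation train through the otherwise non-differentiable spike.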

    Sonic Millip3De with Dynamic Receive Focusing and Apodization Optimization

    Abstract: 3D ultrasound is becoming common for noninvasive medical imaging because of its accuracy, safety, and ease of use. However, the extreme computational requirements (and associated power requirements) of image formation for a large 3D system have, to date, precluded hand-held 3D-capable devices. Sonic Millip3De is a recently proposed hardware design that leverages modern computer architecture techniques, such as 3D die stacking, massive parallelism, and streaming data flow, to enable high-resolution synthetic aperture 3D ultrasound imaging in a single, low-power chip. In this paper, we enhance Sonic Millip3De with a new virtual-source firing sequence and dynamic receive focusing scheme that optimizes receive apertures across multiple depth focal zones. These enhancements further reduce power requirements while maintaining image quality over a large depth range. We present image quality analysis using Field II simulations of cysts in tissue at varying depths to show that our methods do not degrade CNR relative to an ideal system with no power constraints. Then, using an RTL-level design for an industrial 45nm ASIC process, we demonstrate 3D synthetic aperture imaging with a 120x88 transducer array within a 15W full-system power budget (400x less than a conventional DSP solution). We project that continued semiconductor scaling will enable a sub-5W power budget in 16nm technology.
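    As a rough illustration of what dynamic receive focusing computes, a delay-and-sum beamformer derives a per-element time-of-flight delay for each focal point, and those delays must be recomputed as the focal depth changes. This is a minimal geometric sketch, assuming a linear array along x and a sound speed of 1540 m/s; the function and parameter names are ours, not from the paper:

    ```python
    import numpy as np

    def receive_delays(element_x, focus_x, focus_z, c=1540.0):
        # Round-trip time of flight: transmit path from the array origin
        # down to the focal point (depth focus_z), plus the return path
        # from the focal point back to each receive element.
        dist = np.sqrt((element_x - focus_x) ** 2 + focus_z ** 2)
        return (focus_z + dist) / c

    # Delays are smallest at the element nearest the focal point and
    # grow toward the aperture edges; dynamic receive focusing re-evaluates
    # them for every depth focal zone.
    elements = np.linspace(-0.01, 0.01, 5)        # 5 elements over 2 cm
    delays = receive_delays(elements, 0.0, 0.03)  # focus at 3 cm depth
    ```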

    FPGA Architecture for 2D Discrete Fourier Transform Based on 2D Decomposition for Large-Sized Data

    Abstract: Applications based on the Discrete Fourier Transform (DFT) are used extensively in many areas of signal and digital image processing. Of particular interest is the two-dimensional (2D) DFT, which is more computation- and bandwidth-intensive than the one-dimensional (1D) DFT. Traditionally, a 2D DFT is computed using Row-Column (RC) decomposition, where 1D DFTs are computed along the rows followed by 1D DFTs along the columns. Both application-specific and reconfigurable hardware have been used for high-performance implementations of the 2D DFT. However, architectures based on RC decomposition are not efficient for large input sizes due to memory bandwidth constraints. In this paper, we propose an efficient architecture to implement the 2D DFT for large-sized input data based on a novel 2D decomposition algorithm. This architecture achieves very high throughput by exploiting the inherent parallelism of the algorithm decomposition and by utilizing the row-wise burst access pattern of the external memory. A high-throughput memory interface has been designed to enable maximum utilization of the memory bandwidth. In addition, an automatic system generator is provided for mapping this architecture onto a reconfigurable platform of Xilinx Virtex-5 devices. For a 2K x 2K input size, the proposed architecture is 1.96x faster than an RC-decomposition-based implementation under the same memory constraints, and also outperforms other existing implementations.
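    The Row-Column decomposition that the paper improves on is easy to state and verify numerically: 1D DFTs along every row, followed by 1D DFTs along every column, reproduce the full 2D DFT. A minimal NumPy sketch (the paper's novel 2D decomposition itself is not reproduced here):

    ```python
    import numpy as np

    def dft2_row_column(x):
        # Row-Column (RC) decomposition of the 2D DFT:
        # pass 1: 1D DFT along each row (axis=1),
        # pass 2: 1D DFT along each column (axis=0).
        rows = np.fft.fft(x, axis=1)
        return np.fft.fft(rows, axis=0)

    x = np.random.rand(8, 8)
    assert np.allclose(dft2_row_column(x), np.fft.fft2(x))
    ```

    The bandwidth problem the paper targets comes from pass 2: column accesses stride across the whole image, defeating row-wise burst access to external memory for large inputs.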

    End-to-End Benchmarking of Chiplet-Based In-Memory Computing

    In-memory computing (IMC) hardware reduces latency and energy consumption for compute-intensive machine learning (ML) applications. Several SRAM- and RRAM-based IMC hardware architectures for accelerating ML applications have been proposed in the literature. However, crossbar-based IMC hardware poses several design challenges. We first discuss the different ML algorithms recently adopted in the literature. We then discuss the hardware implications of these ML algorithms. Next, we elucidate the need for IMC architectures and the different components within a conventional IMC architecture. After that, we introduce the need for 2.5D or chiplet-based architectures. We then discuss the different benchmarking simulators proposed for monolithic IMC architectures. Finally, we describe an end-to-end chiplet-based IMC benchmarking simulator, SIAM.
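    At its core, a crossbar-based IMC tile performs a matrix-vector multiply in the analog domain, with weights stored as a limited number of discrete conductance levels, which is one source of the design challenges mentioned above. A minimal ideal-device sketch; the uniform quantization scheme and names are our assumptions, not SIAM's device model:

    ```python
    import numpy as np

    def quantize(w, bits=4):
        # Map weights onto 2**bits - 1 uniform conductance levels, a key
        # constraint of SRAM/RRAM crossbar cells (ideal, noise-free model).
        levels = 2 ** bits - 1
        w_min, w_max = float(w.min()), float(w.max())
        if w_max == w_min:
            return w.copy()
        step = (w_max - w_min) / levels
        return np.round((w - w_min) / step) * step + w_min

    def crossbar_mvm(w, x, bits=4):
        # One crossbar operation: a matrix-vector multiply computed in the
        # analog domain, modeled here as an exact product of the quantized
        # weight matrix with the input vector.
        return quantize(w, bits) @ x
    ```

    Real benchmarking simulators layer device noise, ADC/DAC resolution, and interconnect cost on top of this ideal product; chiplet-based simulators such as SIAM additionally model the inter-chiplet network.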